Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial Python Repo Manager #97

Merged
merged 8 commits into from
Oct 25, 2023
Merged

Initial Python Repo Manager #97

merged 8 commits into from
Oct 25, 2023

Conversation

clavedeluna
Copy link
Contributor

@clavedeluna clavedeluna commented Oct 25, 2023

Overview

The purpose here is to kickstart the code for analyzing a project directory, its metadata and dependencies.

Description

  • a PythonRepoManager's goal is to take the parent directory passed to codemodder and to analyze that repo's files for existence of any pkg management system or dependency store, such as requirements.txt, project.toml, setup.py, setup.cfg, poetry's more special project.toml, Pipfile, conda's .yaml files.
  • flexibility is key here as each file type is unique and a repo may contain none or many of these.
  • I've added this architecture but codemodder is not running it at this time. I've also copied some code from DependencyManager which will be deduped later on. This will come soon.

There is some key work that needs to happen going forward:

  1. Dudupe / connect PythonRepoManager with DependencyManager. I think ideally DM will be inside PRM, but we shall see as a next step. There's quite a bit of changes going on to DM right now so I'd like to sandbox it
  2. Handle edge cases. I've added quite a few todos to each parser to handle all sorts of edge cases in each file.
  3. Add remaining file parsers: I added a few but we still need: poetry's more special project.toml, Pipfile, conda's .yaml files.
  4. Higher-level testing: maybe have an integration test that runs PRM on known OSS packages and see the results we get. This would add variety to see if we need edge-case handling.
  5. Heuristics for how to handle a parent_directory with 0 or more of these file types. Where will we add dependencies we need to add? How do we handle dependencies declared in multiple files? How do we handle conflicting information? Do we only take the first found file? Lots of discussion we need here.

@codecov
Copy link

codecov bot commented Oct 25, 2023

Codecov Report

Merging #97 (1ee9b1b) into main (e09942c) will increase coverage by 0.28%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #97      +/-   ##
==========================================
+ Coverage   95.22%   95.51%   +0.28%     
==========================================
  Files          51       60       +9     
  Lines        2285     2432     +147     
==========================================
+ Hits         2176     2323     +147     
  Misses        109      109              
Files Coverage Δ
src/codemodder/codemodder.py 98.05% <100.00%> (+0.03%) ⬆️
src/codemodder/context.py 97.29% <100.00%> (+0.11%) ⬆️
src/codemodder/dependency_manager.py 92.45% <ø> (ø)
...demodder/project_analysis/file_parsers/__init__.py 100.00% <100.00%> (ø)
...odder/project_analysis/file_parsers/base_parser.py 100.00% <100.00%> (ø)
...der/project_analysis/file_parsers/package_store.py 100.00% <100.00%> (ø)
...nalysis/file_parsers/pyproject_toml_file_parser.py 100.00% <100.00%> (ø)
...lysis/file_parsers/requirements_txt_file_parser.py 100.00% <100.00%> (ø)
...ect_analysis/file_parsers/setup_cfg_file_parser.py 100.00% <100.00%> (ø)
...ject_analysis/file_parsers/setup_py_file_parser.py 100.00% <100.00%> (ø)
... and 2 more

@clavedeluna clavedeluna marked this pull request as ready for review October 25, 2023 12:23
@clavedeluna
Copy link
Contributor Author

the codecov failure isn't legitimate to me. It mentions lack of coverage on a file I didn't even touch.

Copy link
Contributor

@andrecsilva andrecsilva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor changes. See comments.

)


class SetupCallVisotor(cst.CSTVisitor):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Visotor -> Visitor

def clean_simplestring(node: cst.SimpleString | str) -> str:
if isinstance(node, str):
return node.strip('"')
return node.value.strip('"')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use node.raw_value. It will strip the quotes and prefixes.

@drdavella
Copy link
Member

drdavella commented Oct 25, 2023

poetry's more special project.toml, Pipfile, conda's .yaml files

These are good call-outs. I hope we can defer these until later and just try to handle the more "standard" cases for now.

Heuristics for how to handle a parent_directory with 0 or more of these file types. Where will we add dependencies we need to add? How do we handle dependencies declared in multiple files? How do we handle conflicting information? Do we only take the first found file? Lots of discussion we need here.

A lot of really good points here too. I think the simplest heuristic will be to have some kind of priority order that we define in advance, and make updates only to the first of those files that we detect. We can build out more functionality from that point. It feels like we're paving a new path here, though, and we're on our own to figure it out.

Copy link
Member

@drdavella drdavella left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I'm sure there is going to be a lot of iteration on this and the dependency manager, but this gives us a really good foundation to build upon. Just a few minor comments but otherwise 👍

self.directory = directory
self.dry_run = dry_run
self.verbose = verbose
self._results_by_codemod = {}
self._failures_by_codemod = {}
self.dependencies = {}
self.registry = registry
self.repo_manager = repo_manager
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor thing but we should add the type declaration at the class definition above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you referring to something else that's not already line 43, repo_manager: PythonRepoManager,?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it should be added on line 35 above too.

@property
@abstractmethod
def file_name(self):
... # pragma: no cover
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a tiny thing but I think you can have the ... on the same line as the def which probably makes the # pragma unnecessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh I hope so!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

black doesn't like inline ... so I left it as is

@clavedeluna clavedeluna dismissed andrecsilva’s stale review October 25, 2023 18:42

Minor requests completed

@clavedeluna clavedeluna merged commit 4663b59 into main Oct 25, 2023
11 checks passed
@clavedeluna clavedeluna deleted the repo-manager branch October 25, 2023 18:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants